
    Examining and improving the effectiveness of relevance feedback for retrieval of scanned text documents

    Important legacy paper documents are digitized and collected in online accessible archives. This enables the preservation, sharing and, significantly, searching of these documents. The text contents of these document images can be transcribed automatically using OCR systems and then stored in an information retrieval system. However, OCR systems make errors in character recognition, which have previously been shown to affect document retrieval behaviour. In particular, relevance feedback query-expansion methods, which are often effective for improving electronic text retrieval, are observed to be less reliable for retrieval of scanned document images. Our experimental examination of the effects of character recognition errors on an ad hoc OCR retrieval task demonstrates that, while baseline information retrieval can remain relatively unaffected by transcription errors, relevance feedback via query expansion becomes highly unstable. This paper examines the reason for this behaviour and introduces novel modifications to standard relevance feedback methods. These methods are shown experimentally to improve the effectiveness of relevance feedback for errorful OCR transcriptions. The new methods combine similar recognised character strings based on term collection frequency and a string edit-distance measure. The techniques are domain independent and make no use of external resources such as dictionaries or training data.
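
    The abstract says similar recognised strings are combined using collection frequency and an edit-distance measure, without giving the exact rule. The Python sketch below illustrates one way such a grouping could work and is not the paper's method: the names `edit_distance` and `group_variants`, the threshold `max_dist`, the greedy "map a rarer string onto a more frequent one within the threshold" rule, and the example frequencies are all hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def group_variants(cf: Counter, max_dist: int = 1) -> dict[str, str]:
    """Map each term to a canonical form (hypothetical greedy heuristic).

    Terms are visited from most to least frequent in the collection. A term
    within max_dist edits of an already chosen canonical term is mapped onto
    it; otherwise the term becomes a new canonical term itself.
    """
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for term, _ in sorted(cf.items(), key=lambda kv: (-kv[1], kv[0])):
        target = next((c for c in canonical
                       if edit_distance(term, c) <= max_dist), None)
        if target is None:
            canonical.append(term)
            target = term
        mapping[term] = target
    return mapping


# Hypothetical collection frequencies for an OCR-transcribed collection.
cf = Counter({"retrieval": 120, "retrleval": 4, "retrieva1": 3, "feedback": 80})
print(group_variants(cf))
```

    Under this assumption, expansion-term statistics could be accumulated per canonical form, so that repeated misrecognitions of one word reinforce rather than dilute each other. The pairwise comparison is quadratic in vocabulary size, so a practical version would probably need to restrict the candidates it compares.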

    DCU at CLEF 2006: Robust cross language track

    The main focus of the DCU group’s participation in the CLEF 2006 Robust Track was not to identify and handle difficult topics in the topic set per se, but rather to explore a new method of re-ranking a retrieved document set. The initial query is used to re-rank documents retrieved using a query expansion method. The intention is to minimise the query drift that can occur when pseudo relevance feedback (PRF) adds expansion terms chosen from irrelevant documents. By re-ranking using the initial query, the relevant set is forced to mimic the initial query more closely without losing the benefits of PRF. Our results show that although our PRF is consistently effective for this task, our re-ranking method generally has little effect on the ranked output.
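
    The re-ranking idea can be illustrated with a small sketch. The details here are assumptions, not taken from the paper: documents are first retrieved with the expanded query, both the expanded-query and initial-query scores are min-max normalised, and each document's final score is a linear interpolation controlled by a hypothetical weight `alpha`. The DCU runs may have combined the two scores differently.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1]; if all scores are equal, map them to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def rerank(expanded: dict[str, float], initial: dict[str, float],
           alpha: float = 0.5) -> list[str]:
    """Re-rank the documents retrieved by the expanded (PRF) query.

    Only documents in the expanded-query result set are kept. A document the
    initial query does not match gets an initial-query score of 0 before
    normalisation. alpha is the (hypothetical) weight on the initial query.
    """
    exp_n = min_max(expanded)
    init_n = min_max({d: initial.get(d, 0.0) for d in expanded})
    combined = {d: alpha * init_n[d] + (1 - alpha) * exp_n[d] for d in expanded}
    # Ties are broken by document ID so the output is deterministic.
    return sorted(combined, key=lambda d: (-combined[d], d))


expanded = {"d1": 10.0, "d2": 9.5, "d3": 4.0}
initial = {"d2": 3.0, "d3": 2.5}
print(rerank(expanded, initial))
```

    With `alpha = 0` the expanded-query ranking is returned unchanged; raising `alpha` pulls the ranking back towards documents that also score well for the original query, which is the intended guard against query drift.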

    Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech

    Searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically related metadata. An important question is what retrieval effectiveness can be expected when searching such transcriptions and different sources of related metadata. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech test collection with manual and automatically derived metadata fields. Using this collection, we investigate the comparative search effectiveness of individual fields comprising automated transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple merging of individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can produce improved search accuracy, but that it is currently important to set its parameters using a suitable training set.
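
    For readers unfamiliar with BM25F, the sketch below shows the commonly used "simple BM25F" formulation: each field's term frequency is length-normalised with a field-specific `b`, scaled by a field weight, and summed into one pseudo-frequency, which only then passes through the `k1` saturation and is multiplied by an IDF. The field names, weights and parameter values are hypothetical, and the IDF variant (with +1 inside the logarithm to keep it non-negative) is one choice among several; this is not necessarily the configuration used in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldParams:
    weight: float   # field boost w_f
    b: float        # length normalisation b_f in [0, 1]
    avg_len: float  # average length of this field in the collection


def bm25f_score(query: list[str],
                doc: dict[str, list[str]],
                params: dict[str, FieldParams],
                df: dict[str, int],
                n_docs: int,
                k1: float = 1.2) -> float:
    """Score one document with simple BM25F (repeated query terms count once)."""
    score = 0.0
    for term in set(query):
        # Combine per-field frequencies into a single pseudo term frequency.
        pseudo_tf = 0.0
        for field, p in params.items():
            tokens = doc.get(field, [])
            tf = tokens.count(term)
            if tf == 0:
                continue
            norm = 1.0 - p.b + p.b * len(tokens) / p.avg_len
            pseudo_tf += p.weight * tf / norm
        if pseudo_tf == 0.0:
            continue
        n_t = df.get(term, 0)
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
        # Saturation is applied once, after the fields have been combined.
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


# Hypothetical fields: an ASR transcript and a manually assigned keyword field.
params = {"asr": FieldParams(1.0, 0.75, 4.0), "keywords": FieldParams(3.0, 0.5, 2.0)}
doc = {"asr": ["the", "war", "ended", "war"], "keywords": ["war", "memory"]}
print(bm25f_score(["war"], doc, params, {"war": 10}, 100))
```

    Because saturation happens after the fields are merged, a term that appears in several fields cannot accumulate unbounded credit, unlike a plain sum of per-field BM25 scores. The per-field weights and `b` values are the kind of parameters that, as the abstract notes, need to be set on a training set.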

    Dublin City University at CLEF 2004: experiments in monolingual, bilingual and multilingual retrieval

    The Dublin City University group participated in the monolingual, bilingual and multilingual retrieval tasks this year. The main focus of our investigation was extending our retrieval system to document languages other than English and completing the multilingual task, which comprised four languages: English, French, Russian and Finnish. Results from our French monolingual experiments indicate that working in French is more effective for retrieval than translating documents and topics into English. However, comparison of our multilingual retrieval results using different topic and document translations reveals that this result does not carry over to retrieved-list merging for the multilingual task in a simple, predictable way.
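
    The abstract refers to merging retrieved lists for the multilingual task without describing the merging method. Purely as an illustration, the sketch below shows one widely used baseline: min-max normalise the scores within each per-language run, pool the documents, and sort by normalised score. The function name `merge_runs`, the cut-off `k` and the choice of normalisation are assumptions, not the DCU method.

```python
def merge_runs(runs: dict[str, list[tuple[str, float]]],
               k: int = 1000) -> list[tuple[str, float]]:
    """Merge per-language ranked lists into one multilingual list.

    runs maps a language code to (doc_id, score) pairs. Scores are min-max
    normalised within each run so that runs with different score ranges can
    be compared; the pooled documents are then sorted by normalised score.
    """
    pooled: dict[str, float] = {}
    for run in runs.values():
        if not run:
            continue
        scores = [s for _, s in run]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        for doc_id, s in run:
            norm = (s - lo) / span if span > 0 else 1.0
            # If a document appears in more than one run, keep its best score.
            pooled[doc_id] = max(norm, pooled.get(doc_id, float("-inf")))
    # Ties are broken by document ID so the output is deterministic.
    merged = sorted(pooled.items(), key=lambda kv: (-kv[1], kv[0]))
    return merged[:k]


runs = {
    "en": [("en1", 12.0), ("en2", 6.0)],
    "fr": [("fr1", 3.0), ("fr2", 2.5), ("fr3", 1.0)],
}
print(merge_runs(runs))
```

    Normalisation matters here because raw retrieval scores from different collections are not directly comparable; without it, a run with a wider score range would tend to dominate the top of the merged list.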

    Dublin City University at CLEF 2004: experiments with the ImageCLEF St Andrew's collection

    For the CLEF 2004 ImageCLEF St Andrew's Collection task, the Dublin City University group carried out three sets of experiments: standard cross-language information retrieval (CLIR) runs using topic translation via machine translation (MT), combination of this run with image matching results from the VIPER system, and a novel document rescoring approach based on automatic MT evaluation metrics. Our standard MT-based CLIR works well on this task. Encouragingly, combination with image matching lists is also observed to produce small positive changes in the retrieval output. However, rescoring using the MT evaluation metrics in their current form significantly reduced retrieval effectiveness.
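
    A simple way to combine a text retrieval run with a content-based image matching run is a weighted sum of normalised scores (a weighted CombSUM). The sketch below assumes that fusion scheme, max-score normalisation and a hypothetical text weight `w_text`; the abstract does not say how the VIPER output was combined, so this is an illustration only.

```python
def fuse(text_run: dict[str, float], image_run: dict[str, float],
         w_text: float = 0.8) -> list[tuple[str, float]]:
    """Weighted CombSUM of a text run and an image-matching run.

    Each run is divided by its maximum score so its top document scores 1.0
    (this assumes non-negative scores). A document missing from one run
    contributes 0 from that run.
    """
    def scale(run: dict[str, float]) -> dict[str, float]:
        top = max(run.values(), default=0.0)
        if top <= 0:
            return {d: 0.0 for d in run}
        return {d: s / top for d, s in run.items()}

    t, v = scale(text_run), scale(image_run)
    docs = set(t) | set(v)
    fused = {d: w_text * t.get(d, 0.0) + (1 - w_text) * v.get(d, 0.0)
             for d in docs}
    # Ties are broken by document ID so the output is deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


text_run = {"img1": 8.0, "img2": 4.0}
image_run = {"img2": 0.9, "img3": 0.45}
print(fuse(text_run, image_run))
```

    The high default text weight is an assumption, chosen in keeping with the abstract's report that the MT-based text run works well while image matching adds only small changes; in practice the weight would have to be tuned.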

    Negotiations of meaning on aspects of proportional reasoning and professional identity in the CoP-PAEM

    In this article we present partial results of ongoing research in the context of a community of practice, made up of researchers and basic-education mathematics teachers, which seeks to identify learning and elements of this context that contribute to the development of the teacher's professional identity. Our considerations result from the analysis of an episode, part of an action carried out by the community members in connection with the joint enterprise of studying Proportional Reasoning, in which one participant proposes to the others a problem with the potential to mobilise relative reasoning, analyses the others' solution strategies, and points out and justifies evidence of this mobilisation when it occurs. Analysis of transcripts of audio recorded at the community members' weekly meetings and of the participants' written records revealed negotiations of meaning regarding the teacher's professional knowledge, the view of oneself and of the teaching profession, as well as aspects of agency and vulnerability of the participant who proposes the task and thereby takes a more central position in the community. This evidence indicates learning by the participants, which allows us to infer that actions of this kind in the context of teacher education can contribute to the development of the teacher's professional identity.